Multi-label lazy learning approach based on firefly method

CHENG Yusheng, QIAN Kun, WANG Yibin, ZHAO Dawei

Journal of Computer Applications 2019, 39 (5): 1305-1311. DOI: 10.11772/j.issn.1001-9081.2018109182

The existing Improved Multi-label Lazy Learning Approach (IMLLA) considers only the label correlation information of a sample's neighbors when using neighbor labels and ignores similarity information, which may reduce the robustness of the approach. To solve this problem, a Multi-label Lazy Learning Approach based on FireFly method (FF-MLLA) was proposed, which introduces the firefly method and combines similarity information with label information. Firstly, Minkowski distance was used to measure the similarity between samples and find the neighbor points. Secondly, the label count vector was improved by combining the neighbor points with the firefly method. Finally, Singular Value Decomposition (SVD) and kernel Extreme Learning Machine (ELM) were used to realize linear classification. Because both label information and similarity information are considered, the robustness of the approach is improved. The experimental results demonstrate that the proposed approach improves classification performance considerably compared with other multi-label learning approaches. Statistical hypothesis testing and stability analysis further illustrate the rationality and effectiveness of the proposed approach.
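
The first two steps of the abstract (finding neighbors by Minkowski distance and building a label count vector) can be illustrated with a minimal sketch. This is not the authors' FF-MLLA implementation: the function names, the toy data, and the plain unweighted count are assumptions made for illustration, and the firefly-based weighting, SVD, and kernel ELM stages are not reproduced because the abstract does not specify them.

```python
from typing import List, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbors(query: Sequence[float],
                      samples: Sequence[Sequence[float]],
                      k: int,
                      p: float = 2.0) -> List[int]:
    """Return the indices of the k samples closest to the query."""
    order = sorted(range(len(samples)),
                   key=lambda i: minkowski(query, samples[i], p))
    return order[:k]


def label_count_vector(neighbor_ids: Sequence[int],
                       labels: Sequence[Sequence[int]]) -> List[int]:
    """For each label, count how many neighbors carry it (unweighted).

    FF-MLLA improves this vector with the firefly method; that weighting
    is NOT shown here because the abstract gives no formula for it.
    """
    n_labels = len(labels[0])
    return [sum(labels[i][j] for i in neighbor_ids) for j in range(n_labels)]


# Hypothetical toy data: four samples, three binary labels each.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
Y = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]

query = [0.2, 0.1]
ids = nearest_neighbors(query, X, k=3)
counts = label_count_vector(ids, Y)
print(ids, counts)
```

In this unweighted form every neighbor contributes equally regardless of its distance, which is the limitation the abstract attributes to using label information without similarity information.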

Feature selection for multi-label distribution learning with streaming data based on rough set

CHENG Yusheng, CHEN Fei, WANG Yibin

Journal of Computer Applications 2018, 38 (11): 3105-3111. DOI: 10.11772/j.issn.1001-9081.2018041275

Traditional feature selection algorithms cannot process streaming feature data, their redundancy calculation is complicated, and their description of instances is not accurate enough. To solve these problems, a multi-label Distribution learning Feature Selection with Streaming Data Using Rough Set (FSSRS) was proposed. Firstly, the online streaming feature selection framework was introduced into multi-label learning. Secondly, the original conditional probability was replaced by the dependency in rough set theory, which made the streaming data feature selection algorithm more efficient because it uses only information computed from the data itself. Finally, since each label describes the same instance to a different degree in the real world, label distribution was used instead of traditional logical labels to make the description of the instance more accurate. The experimental results show that the proposed algorithm can retain the features that are highly correlated with the label space, so that the classification accuracy is improved to a certain extent compared with that without feature selection.
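
The rough set dependency mentioned in the abstract can be sketched with the standard definition: the fraction of instances whose equivalence class under the chosen features maps to a single decision value. The sketch below is a simplification under stated assumptions: it uses one discrete decision per instance rather than a label distribution, and the function name and toy table are hypothetical, not taken from FSSRS.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def dependency(rows: Sequence[Sequence[Hashable]],
               cond_idx: Sequence[int],
               decisions: Sequence[Hashable]) -> float:
    """Rough set dependency degree: |positive region| / |universe|.

    Instances are grouped by their values on the features in cond_idx.
    A group belongs to the positive region when all of its members
    share the same decision value.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[j] for j in cond_idx)].append(i)
    positive = sum(len(members) for members in blocks.values()
                   if len({decisions[i] for i in members}) == 1)
    return positive / len(rows)


# Hypothetical toy table: two discrete features, one decision per row.
rows: List[tuple] = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = ["a", "a", "b", "a"]

print(dependency(rows, [0], decisions))     # feature 0 alone
print(dependency(rows, [0, 1], decisions))  # both features together
```

One plausible use in a streaming setting, stated here as an assumption rather than as the paper's criterion, is to compare the dependency before and after a newly arrived feature is added and keep the feature only if the value increases.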

Text semantic classification algorithm based on risk decision

CHENG Yusheng, LIANG Hui, WANG Yibin, LI Kang

Journal of Computer Applications 2016, 36 (11): 2963-2968. DOI: 10.11772/j.issn.1001-9081.2016.11.2963

Most traditional text classification algorithms are based on the vector space model and use a hierarchical classification tree model for statistical analysis. Such models mostly do not incorporate the semantic information of feature items, so they may produce a large number of frequent semantic patterns and increase the number of classification paths. Combining the strong discriminative power of the essential Emerging Pattern (eEP) in classification with the rough set model based on minimum expected risk decision, a Text Semantic Classification algorithm with Threshold Optimization (TSCTO) was presented. Firstly, after the document feature frequency distribution table was obtained, the minimum threshold value was calculated by the rough set combined with the distribution density matrix. Then, the high-frequency words of the semantic intra-class document frequency were obtained by combining semantic analysis with the inverse document frequency method. To get the simplest model, the eEP pattern was used for classification. Finally, the text similarity score was calculated using a similarity formula and the HowNet semantic relevance degree, and some thresholds were optimized by the three-way decision theory. The experimental results show that the TSCTO algorithm improves text classification performance to a certain extent.
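
The three-way decision step can be illustrated in its generic form: a score is compared against two thresholds, giving acceptance, rejection, or deferral. This is only a sketch of the general idea. The function name, the threshold values 0.7 and 0.3, and the sample scores are hypothetical, and the abstract does not state how TSCTO derives or optimizes its thresholds.

```python
from typing import Dict


def three_way_decision(score: float,
                       alpha: float = 0.7,
                       beta: float = 0.3) -> str:
    """Generic three-way decision rule with thresholds beta < alpha.

    score >= alpha          -> "accept" (assign the text to the class)
    score <= beta           -> "reject" (do not assign it)
    beta < score < alpha    -> "defer"  (boundary region, decide later)

    The default thresholds are illustrative assumptions only.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if score >= alpha:
        return "accept"
    if score <= beta:
        return "reject"
    return "defer"


# Hypothetical similarity scores for three documents against one class.
scores: Dict[str, float] = {"doc_a": 0.82, "doc_b": 0.45, "doc_c": 0.10}
results = {name: three_way_decision(s) for name, s in scores.items()}
print(results)
```

The deferral region is what distinguishes this rule from a single-threshold classifier: borderline texts are set aside instead of being forced into a class.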